Using Domain Similarity for Performance Estimation

Authors

  • Vincent Van Asch
  • Walter Daelemans
Abstract

Many natural language processing (NLP) tools exhibit a decrease in performance when they are applied to data that is linguistically different from the corpus used during development. This makes it hard to develop NLP tools for domains for which annotated corpora are not available. This paper explores a number of metrics that attempt to predict the cross-domain performance of an NLP tool through statistical inference. We apply different similarity metrics to compare different domains and investigate the correlation between similarity and the accuracy loss of an NLP tool. We find that the correlation between the performance of the tool and the similarity metric is linear and that the latter can therefore be used to predict the performance of an NLP tool on out-of-domain data. The approach also provides a way to quantify the difference between domains.
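The idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity metrics, so the Jensen-Shannon divergence over unigram distributions is used here purely as a stand-in, and the function names (`unigram_dist`, `js_divergence`, `fit_line`) and all numbers are hypothetical, not taken from the paper. The sketch measures how far a test domain is from the training domain, fits a straight line to (divergence, accuracy) pairs from domains where accuracy is known, and uses that line to estimate accuracy on a new domain.

```python
import math
from collections import Counter


def unigram_dist(tokens):
    """Relative frequency of each token in a corpus (list of strings)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Assumption: used here only as an example similarity metric.
    0.0 means identical distributions, 1.0 means no shared vocabulary.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up toy corpora standing in for a training and a test domain.
train = "the patient was given the drug".split()
test = "the court ruled on the appeal".split()
distance = js_divergence(unigram_dist(train), unigram_dist(test))

# Made-up (divergence, accuracy) pairs for domains with known accuracy.
known_divergences = [0.1, 0.2, 0.3]
known_accuracies = [0.95, 0.93, 0.91]
slope, intercept = fit_line(known_divergences, known_accuracies)

# Estimated accuracy on the new domain under the linear assumption.
estimated_accuracy = slope * distance + intercept
print(f"divergence={distance:.3f} estimated accuracy={estimated_accuracy:.3f}")
```

A linear fit is the simplest model consistent with the abstract's claim of a linear relation; with real data one would also check the strength of the correlation before trusting the estimate.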

Similar resources

Case Mix Planning using The Technique for Order of Preference by Similarity to Ideal Solution and Robust Estimation: a Case Study

Management of surgical units and operating rooms (ORs) plays a key role in optimizing hospital utilization. Along these lines, case mix planning (CMP) is normally applied to the long-term planning of ORs; it refers to allocating OR time to each patient group. In this paper a mathematical model is applied to optimize the allocation of OR time among surgical groups. In addition, another technique is ...


Channel Effect Compensation in OFDM System under Short CP Length Using Adaptive Filter in Wavelet Transform Domain

Channel estimation is one of the most important issues in communication systems, because it can reduce the error rate of sending and receiving information. In this regard, estimation of OFDM-based wireless channels in the frequency domain, using known sub-carriers as pilots, is of particular importance. In this paper, channel estimation under short cyclic prefix (CP) in OFDM system is con...


Channel Estimation and CFO Compensation in OFDM System Using Adaptive Filters in Wavelet Transform Domain

In this paper, joint estimation of the channel, receiver frequency-dependent IQ imbalance, and carrier frequency offset under short cyclic prefix (CP) length is considered in an OFDM system. An adaptive algorithm based on the set-membership filtering (SMF) algorithm is used to compensate for these impairments. In short CP length, per-tone equalization (PTEQ) structure is used to avoid i...


Evaluation and Comparison of Topographic Correction Models Is Applied on the Series Landsat Images Using Spectrometery Data

The effect of topography on the radiance recorded in satellite images can reduce the accuracy of algorithms applied to those images. Therefore, to reduce the effect of topography, various correction models based on the interaction between light and object need to be defined. This research introduces the Lambertian correction model (cosine model) and non-Lambertian correction models (Minnaert and...


Determination of Stability Domains for Nonlinear Dynamical Systems Using the Weighted Residuals Method

Finding a suitable estimate of the stability domain around stable equilibrium points is an important issue in the study of nonlinear dynamical systems. This paper applies a set of analytical-numerical methods to estimate the region of attraction for autonomous nonlinear systems. In mechanical and structural engineering, autonomous systems could be found in large deformation problems or c...


Publication date: 2010